This report is Assignment 1 for ETC5521 by Team Dugong, comprising Abhishek Sinha and Yezi He.
The House Price Index (HPI) is a broad measure of the movement of single-family house prices across a designated market. It gives housing economists an analytical tool for estimating changes in the rates of mortgage defaults, prepayments and housing affordability in specific geographic areas. Economists and fund managers often use HPI to analyse long-term trends in consumer behavior and the financial situation of a country. For this reason, currency investors watch HPI alongside the Consumer Price Index (CPI), Gross Domestic Product (GDP) and unemployment figures. Some experts also argue that HPI can help predict inflation. With proper lender assistance, HPI can also help a buyer decide whether it is a good time to purchase a new home.
The housing market represents about 15 to 18 percent of U.S. GDP, so a weak or strong housing market can have a substantial influence on the direction of the overall economy. This makes HPI even more important to track and use in analysis.
The motivation for this report is to learn about the behavior of the House Price Index in the U.S. and how it is affected by the economy and the market. Through our analysis we try to identify any relation between HPI at the national level and at the state level. The report also analyses the effect of external factors such as recessions and mortgage rates on HPI.
At present, the House Price Index is a relatively little-known economic indicator that is quickly gaining popularity among economists. This report is limited to examining HPI as an alternative economic indicator for understanding recessions and its relation with mortgage rates. Analyzing HPI in greater detail would allow a more informative view of the state of an economy and of HPI’s relation with other factors such as currency fluctuation and GDP. Because the strength of the housing market also affects the workforce employed in this sector, relating HPI to population and workforce data would provide another interesting way to analyse and predict the state of the workforce and of incomes during recessions.
To perform this analysis, a regularly updated data source is used to obtain current, seasonally adjusted HPI values along with current mortgage rates. In this case the source is Freddie Mac. Information about the prominent recession periods in the U.S. is obtained by web scraping Wikipedia.
In the U.S., the House Price Index is published by the Federal Housing Finance Agency (FHFA), using data supplied by the Federal National Mortgage Association (FNMA), typically known as Fannie Mae, and Federal Home Loan Mortgage Corp. (FHLMC), commonly known as Freddie Mac.
Freddie Mac publishes the monthly index values of the Freddie Mac House Price Index (FMHPI) each quarter. Index values are available for the nation, the 50 states and the District of Columbia, and the more than 380 metropolitan statistical areas (MSAs) in the U.S. The FMHPI is constructed using a repeat transactions methodology, which has become a common practice in housing research. The FMHPI is estimated with data including transactions on single-family detached and town home properties serving as collateral on loans originated between January 1, 1975, and the end of the most recent index month, where the loan has been purchased by Freddie Mac or Fannie Mae.
Repeat transactions indices measure price appreciation while holding constant property type and location, by comparing the price of the same property over two or more transactions. By construction, therefore, the repeat transaction requirement excludes new homes. The change in price of a given property measures the underlying rate of appreciation because basic factors such as physical location, climate, housing type, etc., are constant between transactions.
Significant renovation or deterioration to a property may, however, lead to higher or lower appreciation, respectively, than what results from the underlying price change of the property as it existed in the first transaction. The methodology attempts to identify and exclude such outlier properties that may influence the index away from a more accurate estimate of average property appreciation rates.
Further, the calculation of the HPI also involves appraisal values for some refinance decisions. Refinance adjustment terms are included in the regression specification to account for the possibility that appraisal values might systematically differ from purchase prices. These additional transactions greatly increase the power of estimation, especially at smaller levels of geography.
Thus the primary differences between the FMHPI and other indices are the inclusion of some appraisal values used for refinance transactions, the choice of geographic weights, the method for identifying outliers, and the use of statistical smoothing to more efficiently estimate indices at finer geographic levels.
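The repeat-transactions idea described above can be illustrated with a minimal sketch. This is a textbook-style repeat-sales regression on invented data, not the FMHPI methodology: it leaves out the refinance adjustment terms, outlier screening, geographic weights and smoothing mentioned above. The names `pairs`, `true_index` and `repeat_sales_index` are hypothetical.

```r
# Toy repeat-sales pairs: each row is one property sold twice.
# All numbers are invented for illustration.
true_index <- c(100, 110, 121, 133.1)       # periods 0..3, 10% growth per period
pairs <- data.frame(
  t1   = c(0, 0, 1, 0, 1, 2),               # period of the first sale
  t2   = c(1, 2, 2, 3, 3, 3),               # period of the second sale
  base = c(200, 350, 150, 500, 275, 420)    # property-specific price level
)
pairs$p1 <- pairs$base * true_index[pairs$t1 + 1] / 100
pairs$p2 <- pairs$base * true_index[pairs$t2 + 1] / 100

# Response: log price relative between the two sales of the same property
y <- log(pairs$p2 / pairs$p1)

# Design matrix: one column per period after the base period,
# -1 in the first-sale period and +1 in the second-sale period
n_periods <- max(pairs$t2)
X <- matrix(0, nrow = nrow(pairs), ncol = n_periods)
for (i in seq_len(nrow(pairs))) {
  if (pairs$t1[i] > 0) X[i, pairs$t1[i]] <- -1
  X[i, pairs$t2[i]] <- 1
}

# Least-squares fit without an intercept; period 0 is the base with index 100
beta <- lm.fit(X, y)$coefficients
repeat_sales_index <- 100 * exp(c(0, beta))
round(repeat_sales_index, 2)
```

Because each property is compared only with itself, the property-specific price level (`base`) cancels out of the log price relative, which is why location and housing type do not need to be modelled explicitly.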
The seasonally adjusted HPI (FMHPI) raw data set consists of both state-level HPI and national HPI in xls format. To use it for analysis in R, the preferred tool for this report, the raw data is cleaned by splitting the combined month variable into two separate variables, year and month, updating data types and removing rows with a missing year. Freddie Mac has maintained this data set from January 1975 onwards. For this report the data runs to November 2018.
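The cleaning steps above could look roughly like the sketch below. It is not the report's actual cleaning code: the `1975M01` period format and the footer row are assumptions about the raw file, and the data frame names are hypothetical. The three AK values are taken from the HPI table shown later in this report.

```r
# Hypothetical raw extract: a combined period column plus one state column.
# The "1975M01" format and the footer row are assumptions for illustration.
raw <- data.frame(
  month = c("1975M01", "1975M02", "1975M03", "Source: footer text"),
  AK    = c("34.75510", "35.18865", "35.55284", NA),
  stringsAsFactors = FALSE
)

# Rows whose period matches YYYYMmm carry data; anything else gets a missing year
is_period <- grepl("^[0-9]{4}M[0-9]{2}$", raw$month)
year  <- rep(NA_integer_, nrow(raw))
month <- rep(NA_integer_, nrow(raw))
year[is_period]  <- as.integer(substr(raw$month[is_period], 1, 4))
month[is_period] <- as.integer(substr(raw$month[is_period], 6, 7))

hpi_clean <- data.frame(
  year        = year,
  month       = month,
  state       = "AK",
  price_index = as.numeric(raw$AK),   # update data type: character to numeric
  stringsAsFactors = FALSE
)

# Remove rows with a missing year
hpi_clean <- hpi_clean[!is.na(hpi_clean$year), ]
hpi_clean
```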
The second data set contains Freddie Mac’s mortgage rates. Mortgage rates are an important factor that can influence a home buyer’s decision. Freddie Mac maintains an extensive data set of mortgage rates covering different types of mortgage, such as the ‘Fixed rate 30 year mortgage’, the ‘Fixed rate 15 year mortgage’ and the ‘5-1 Hybrid Adjustable rate mortgage’. Analyzing mortgage rates alongside HPI is also important because mortgages are the most common type of personal loan held by households. In other words, buying a home with a mortgage is probably the largest financial transaction most households will enter into.
Mortgages are mainly of two types, fixed and adjustable. A fixed-rate mortgage is a loan with a fixed interest rate, whereas the interest rate of an adjustable or hybrid loan changes under defined conditions.
Fixed-Rate Mortgage: The monthly payment remains the same for the life of this loan. The interest rate is locked in and does not change. Loans have a repayment life span of 30 years; shorter lengths of 10, 15 or 20 years are also commonly available. Shorter loans will have larger monthly payments that are offset by lower interest rates and lower overall cost.
Adjustable-Rate Mortgage (ARM): Because the interest rate is not locked in, the monthly payment for this type of loan will change over the life of the loan. ARMs can be attractive if you are planning on staying in your home for only a few years. It is important to consider how often the interest rate will adjust. For example, a 5/1 ARM has a fixed rate for five years, after which the interest rate adjusts every year for the remainder of the loan period.
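The fixed-rate trade-off described above (shorter loans have larger monthly payments but a lower overall cost) can be checked with the standard amortisation formula. The loan amount and the two interest rates below are hypothetical and are not taken from the Freddie Mac data.

```r
# Monthly payment on a fully amortising fixed-rate loan
# principal: amount borrowed; annual_rate: e.g. 0.04 for 4%; years: loan term
monthly_payment <- function(principal, annual_rate, years) {
  r <- annual_rate / 12   # monthly interest rate
  n <- years * 12         # number of monthly payments
  principal * r * (1 + r)^n / ((1 + r)^n - 1)
}

principal <- 300000                               # hypothetical loan amount
pay_30 <- monthly_payment(principal, 0.040, 30)   # hypothetical 30-year rate
pay_15 <- monthly_payment(principal, 0.035, 15)   # hypothetical 15-year rate

comparison <- data.frame(
  term_years      = c(30, 15),
  monthly_payment = round(c(pay_30, pay_15), 2),
  total_paid      = round(c(pay_30 * 360, pay_15 * 180), 2)
)
comparison
```

Under these assumed rates the 15-year loan has the higher monthly payment and the lower total amount paid, in line with the description above.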
Freddie Mac has maintained this data set from April 1971 onwards. Since then it has surveyed lenders across the nation weekly to determine the average 30-year fixed-rate mortgage rate; the 1-year ARM was added to the survey in 1984 and the 15-year fixed-rate mortgage rate in 1991. In January 2005, Freddie Mac added a 5/1 hybrid ARM series to the survey. Survey reminder emails are sent out on Mondays and lenders are asked to respond by close of business Wednesday.
The final data set was created with information scraped from Wikipedia on major recession periods in the U.S., such as the Great Recession that began in 2007. These events had major effects across many industries and markets and are an important factor in analyzing house prices over time. A recession can slow down the market and increase unemployment, which leads to loss of income and falling wages and ultimately reduces the spending power of potential home buyers.
For a broad overview of the trend in the House Price Index in the U.S., we start by looking at the average HPI at the national level from 1975 onwards.
In this section we want to find out which factors affect the House Price Index (HPI) and how they affect it. We arrive at an answer by analyzing three questions. The first is how state-level HPI values are related to the national HPI value. The second is the relationship between mortgage rates and HPI. The third is how recessions affected HPI.
The table below shows the first rows of the HPI dataset, with the U.S. average HPI as ‘us_avg’ and the state-level HPI as ‘price_index’.
| year | month | us_avg | state | price_index |
|---|---|---|---|---|
| 1975 | 1 | 23.67495 | AK | 34.75510 |
| 1975 | 2 | 23.83883 | AK | 35.18865 |
| 1975 | 3 | 24.06335 | AK | 35.55284 |
| 1975 | 4 | 24.33704 | AK | 35.91469 |
| 1975 | 5 | 24.47008 | AK | 36.32859 |
Below we can look at the Mortgage Data.
| date | fixed_rate_30_yr | fees_and_pts_30_yr | fixed_rate_15_yr | fees_and_pts_15_yr | adjustable_rate_5_1_hybrid | fees_and_pts_5_1_hybrid | adjustable_margin_5_1_hybrid | spread_30_yr_and_fixed_5_1_adjustable | delete |
|---|---|---|---|---|---|---|---|---|---|
| 1971-04-02 | 7.33 | NA | NA | NA | NA | NA | NA | NA | NA |
| 1971-04-09 | 7.31 | NA | NA | NA | NA | NA | NA | NA | NA |
| 1971-04-16 | 7.31 | NA | NA | NA | NA | NA | NA | NA | NA |
| 1971-04-23 | 7.31 | NA | NA | NA | NA | NA | NA | NA | NA |
| 1971-04-30 | 7.29 | NA | NA | NA | NA | NA | NA | NA | NA |
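library(dplyr)      # provides the %>% pipe
library(kableExtra) # provides kable_styling(); kable() comes from knitr
# Recession data from the tidytuesday repository
recession_dates <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-02-05/recessions.csv")
# Show the first row as a styled table
recession_dates %>%
  head(1) %>%
  knitr::kable(caption = "Recession Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))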
| name | period_range | duration_months | time_since_previous_recession_months | peak_unemploy_ment | gdp_decline_peak_to_trough | characteristics |
|---|---|---|---|---|---|---|
| Great Depression | Aug 1929-Mar 1933ct 1929-Dec 1941 | 433 years7 months | 0211 year9 months | 24.921.3%(1932)[45]– 24.9%(1933)[46] | 26.7−26.7% | A banking panic and a collapse in the money supply took place in the United States that was exacerbated by international commitment to the gold standard.[47][48][49]Extensive new tariffs and other factors contributed to an extremely deep depression.[50] GDP, industrial production, employment, and prices fell substantially. The economy began to recover in the mid 30s with gold inflow expanding the money supply and improving expectations but double dipped during the Recession of 1937-38. The ultimate recovery has been credited to monetary policy and monetary expansion.[51] |
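The raw ‘period_range’ value above still carries residue from the scraped page, so it has to be parsed before it can be used as dates. The sketch below shows one possible way to do this, assuming that the first two month-year tokens in the string are the start and end of the recession. The helper names are hypothetical and this is not necessarily how the report derived its dates.

```r
# Convert "Mon YYYY" (or "Month YYYY") to the first day of that month.
# Matching against month.abb avoids locale-dependent date parsing.
to_month_start <- function(x) {
  parts <- strsplit(x, " ", fixed = TRUE)[[1]]
  month_num <- match(substr(parts[1], 1, 3), month.abb)
  as.Date(sprintf("%s-%02d-01", parts[2], month_num))
}

# Take the first two month-year tokens in a raw period string as start and end
parse_period <- function(period_range) {
  tokens <- regmatches(
    period_range,
    gregexpr("[A-Z][a-z]+ [0-9]{4}", period_range, perl = TRUE)
  )[[1]]
  data.frame(from = to_month_start(tokens[1]), to = to_month_start(tokens[2]))
}

# The first string is the raw value shown above; the second is a hypothetical clean value
raw_periods <- c("Aug 1929-Mar 1933ct 1929-Dec 1941", "Dec 2007-June 2009")
recession_periods <- do.call(rbind, lapply(raw_periods, parse_period))
recession_periods
```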
The analysis needs the HPI data in a time series format. A time series HPI dataset is created from the year and month information.
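As a minimal sketch of this step, the year and month columns can be combined either into a Date column or into a monthly `ts` object. The five values are the first ‘us_avg’ rows from the HPI table above; the object names are hypothetical and the report's own code may differ.

```r
# First rows of the national series, taken from the HPI table above
hpi <- data.frame(
  year   = c(1975L, 1975L, 1975L, 1975L, 1975L),
  month  = c(1L, 2L, 3L, 4L, 5L),
  us_avg = c(23.67495, 23.83883, 24.06335, 24.33704, 24.47008)
)

# Option 1: a Date column (first day of each month), convenient for plotting
hpi$date <- as.Date(sprintf("%d-%02d-01", hpi$year, hpi$month))

# Option 2: a monthly ts object starting in January 1975
hpi_ts <- ts(hpi$us_avg, start = c(hpi$year[1], hpi$month[1]), frequency = 12)

hpi$date
hpi_ts
```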
To begin with the primary question of this report, the behavior of the House Price Index in the U.S., we plot the time series of HPI values.
Figure 3.1: Figure 1
Figure 1 shows the U.S. average HPI from 1975 onwards. HPI has grown over time, which means that the prices of single-family houses have increased in spite of recessions. HPI values in the U.S. increase gradually until 2000, followed by a period of relatively fast increase. This can be explained by the concept of a housing bubble. The U.S. experienced a major housing bubble in the 2000s, caused by inflows of money into housing markets, loose lending conditions, and government policy to promote home ownership. This temporary bubble came crashing down in 2007 with the start of the Great Recession, which saw HPI values nosedive. As the economy started to recover in 2011, HPI values rose again. This co-movement of HPI and the economy supports the usefulness of HPI as an economic indicator.
The first sub-question is the relation between national HPI and state HPI. Because the national HPI is driven by the states, it is important to see which states are the driving force behind it and which are struggling. Analysing the HPI values for all 50 states and the District of Columbia individually is arduous, so we instead group the states into regions and compare each region with the national values. To do that we use a built-in R dataset, as mentioned in Git, which contains U.S. state information.
## state region
## 1 AL South
## 2 AK West
## 3 AZ West
## 4 AR South
## 5 CA West
## 6 CO West
## 7 CT Northeast
## 8 DE South
## 9 FL South
## 10 GA South
## 11 HI West
## 12 ID West
## 13 IL North Central
## 14 IN North Central
## 15 IA North Central
## 16 KS North Central
## 17 KY South
## 18 LA South
## 19 ME Northeast
## 20 MD South
## 21 MA Northeast
## 22 MI North Central
## 23 MN North Central
## 24 MS South
## 25 MO North Central
## 26 MT West
## 27 NE North Central
## 28 NV West
## 29 NH Northeast
## 30 NJ Northeast
## 31 NM West
## 32 NY Northeast
## 33 NC South
## 34 ND North Central
## 35 OH North Central
## 36 OK South
## 37 OR West
## 38 PA Northeast
## 39 RI Northeast
## 40 SC South
## 41 SD North Central
## 42 TN South
## 43 TX South
## 44 UT West
## 45 VT Northeast
## 46 VA South
## 47 WA West
## 48 WV South
## 49 WI North Central
## 50 WY West
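A lookup like the one listed above can be built from base R's built-in `state.abb` and `state.region` vectors, which appear consistent with the listing. The sketch below then attaches the region to a few state-level rows and averages by region. Apart from the AK value, the HPI numbers are hypothetical, and the report's own aggregation for Figure 2 may differ.

```r
# Built-in datasets: state.abb (postal codes) and state.region (region factor)
state_regions <- data.frame(
  state  = state.abb,
  region = as.character(state.region),
  stringsAsFactors = FALSE
)

# Toy state-level HPI rows; the AK value is from the HPI table, the rest are hypothetical
state_hpi <- data.frame(
  state       = c("AK", "CA", "NY", "TX", "DC"),
  price_index = c(34.75510, 40, 30, 28, 25),
  stringsAsFactors = FALSE
)

# Attach regions; DC is not in state.abb, so its region is NA
hpi_region <- merge(state_hpi, state_regions, by = "state", all.x = TRUE)

# Mean HPI by region; rows with a missing region are dropped by the formula interface
region_avg <- aggregate(price_index ~ region, data = hpi_region, FUN = mean)
region_avg
```

One consequence of this lookup is that the District of Columbia has no region, so it drops out of the regional comparison unless it is assigned one by hand.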
Figure 3.2: Figure 2
Figure 2 shows the four regions of the U.S. and how state-level HPI values behave with respect to the national level. The West and Northeast regions follow roughly the same path as the national HPI. These two regions contain states with strong economies and larger populations, partly because of their tech industries; California in the West and New York or Connecticut in the Northeast are examples. The South and North Central regions, which have comparatively smaller economies and are less populated, also saw a gradual increase in HPI, but the effect of the housing bubble and the Great Recession was less drastic. This shows that the effect of factors such as recessions and economic conditions varies across states and thus cannot be explained by a single national HPI.
The second sub-question looks at the effect of mortgage rates on HPI. Mortgage rates and HPI are two separate measures calculated from different parameters, but their relation can give interesting insights. Our analysis focuses on the ‘Fixed 30 year rate’ mortgage, as it is the more popular product and has been recorded from the beginning of the survey, which gives the most complete picture.
Figure 3.3: Figure 3
Figure 3 shows that as the House Price Index increased over time, mortgage rates, surprisingly, fell sharply. This means that banks and other lenders are offering housing loans at lower interest rates than ever. As people buy homes using mortgages most of the time, financing a house has become easier. This suggests that banks have enough reserves to offer mortgage loans at such low rates, and ultimately that wealth is circulating actively in the economy. Thus it can be said that mortgage rates have less impact on HPI values.
To examine this finding further, we fit a linear model (lm) of the U.S. average HPI on the lagged mortgage rate.
Figure 3.4: Figure 4
##
## Call:
## lm(formula = us_avg ~ lag(`mortgage rates`), data = hpi_mort)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.83 -17.22 -0.46 16.31 65.49
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 182.3953 3.2283 56.5 <2e-16 ***
## lag(`mortgage rates`) -11.0087 0.3694 -29.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 28.03 on 525 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.6285, Adjusted R-squared: 0.6278
## F-statistic: 888.3 on 1 and 525 DF, p-value: < 2.2e-16
Looking at the summary statistics of the model, the slope is -11.009, which indicates that a 1 percentage point increase in the lagged mortgage rate is associated with a decrease of about 11.009 in the U.S. average HPI. An adjusted R-squared of 0.6278 indicates a reasonably good fit: roughly 62.78% of the variance in the response variable (U.S. average HPI) can be explained by the mortgage rate. The p-value is below 2.2e-16, so the model is significant at the 0.1% level.
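To make the structure of this model concrete, the sketch below fits the same kind of regression on invented data. The `lag` in the call above appears to pair each month with the previous month's rate, which would explain the one observation dropped for missingness; the base R helper `lag1` here is a hypothetical stand-in for that step, and all values are made up so that the true slope is known.

```r
# Toy monthly mortgage rates; all values are hypothetical
rate <- c(7.3, 7.5, 8.0, 8.4, 9.1, 8.8, 8.2, 7.9)

# Lag by one period: each month is paired with the previous month's rate
lag1 <- function(x) c(NA, head(x, -1))
rate_lag <- lag1(rate)

# Construct an HPI that depends exactly on last month's rate (intercept 180, slope -11)
us_avg <- 180 - 11 * rate_lag
us_avg[1] <- 100   # arbitrary; this row is dropped because its lagged rate is NA

toy <- data.frame(us_avg = us_avg, rate_lag = rate_lag)
fit <- lm(us_avg ~ rate_lag, data = toy)

coef(fit)   # intercept and slope
nobs(fit)   # one fewer than the number of rows
```

A regression of one trending series on another can show a strong fit without any causal link, so the slope is best read as an association over the sample period.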
The third sub-question is the effect of recessions on HPI values. Analyzing the effect of recessions on HPI using economic data is outside the scope of this report, but the data scraped from Wikipedia allows us to look at the behaviour of HPI during recession periods.
| name | from | to |
|---|---|---|
| Great Depression | 1929-08-01 | 1933-03-01 |
| Recession of 1937–1938 | 1937-05-01 | 1938-06-01 |
| Recession of 1945 | 1945-02-01 | 1945-10-01 |
| Recession of 1949 | 1948-11-01 | 1949-10-01 |
| Recession of 1953 | 1953-07-01 | 1954-05-01 |
| Recession of 1958 | 1957-08-01 | 1958-04-01 |
| Recession of 1960–61 | 1960-04-01 | 1961-02-01 |
| Recession of 1969–70 | 1969-12-01 | 1970-11-01 |
| 1973–75 recession | 1973-11-01 | 1975-03-01 |
| 1980 recession | 1980-01-01 | 1980-07-01 |
| 1981–1982 recession | 1981-07-01 | 1982-11-01 |
| Early 1990s recession in the United States | 1990-07-01 | 1991-03-01 |
| Early 2000s recession | 2001-03-01 | 2001-11-01 |
| Great Recession | 2007-12-01 | 2009-06-01 |
Figure 3.5: Figure 5
Figure 5 shows the U.S. national HPI values over time, with recession periods marked by vertical lines. It suggests that before the Great Recession of 2007, recessions did not have much impact on HPI. Even the Great Recession does not clearly explain the relation between recessions and HPI.
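One way to relate the recession table to the monthly HPI series is to flag each month that falls inside a recession window. The sketch below uses the last three rows of the recession table above; the function name `in_recession` is hypothetical and this is not necessarily how Figure 5 was produced.

```r
# Recession windows from the table above (last three rows)
recessions <- data.frame(
  name = c("Early 1990s recession in the United States",
           "Early 2000s recession",
           "Great Recession"),
  from = as.Date(c("1990-07-01", "2001-03-01", "2007-12-01")),
  to   = as.Date(c("1991-03-01", "2001-11-01", "2009-06-01"))
)

# TRUE when a date falls inside any recession window (inclusive at both ends)
in_recession <- function(dates, windows) {
  vapply(seq_along(dates), function(i) {
    any(dates[i] >= windows$from & dates[i] <= windows$to)
  }, logical(1))
}

months <- seq(as.Date("2007-01-01"), as.Date("2010-12-01"), by = "month")
flag <- in_recession(months, recessions)
sum(flag)   # number of months flagged as recession months
```

With such a flag attached to the HPI data, the change in HPI inside and outside recession months can be compared numerically rather than only read off the plot.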
Figure 3.6: Figure 6
Adding mortgage rates to Figure 5 shows that the 1980 recession and the 1981–1982 recession have a strong connection to the mortgage rates of that period.
In the end, this analysis tries to answer the primary question we started with. Based on the analysis, it can be said that the House Price Index in the U.S. has increased over time and has mainly remained independent of external factors such as recessions and mortgage rates.
The following packages are used to produce this report: tidyverse (Wickham, Averick, et al. 2019), lubridate (Grolemund and Wickham 2011), stringr (Wickham 2019b), kableExtra (Zhu 2019), readxl (Wickham and Bryan 2019), rvest (Wickham 2019a), janitor (Firke 2020), dplyr (Wickham, François, et al. 2020), plotly (Sievert 2020), MASS (???), scales (Wickham and Seidel 2020), broom (Robinson and Hayes 2020).
Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Robinson, David, and Alex Hayes. 2020. Broom: Convert Statistical Analysis Objects into Tidy Tibbles. https://CRAN.R-project.org/package=broom.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.
Wickham, Hadley. 2019a. Rvest: Easily Harvest (Scrape) Web Pages. https://CRAN.R-project.org/package=rvest.
———. 2019b. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2020. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Dana Seidel. 2020. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.